Python S9 Seaborn

My Course Notes and Code

These are my notes from Jose Portilla's Udemy course, available here.

I'm focusing on section 9 of the course, which deals with Seaborn.


S9V47 Intro to Seaborn

  • Statistical plotting library
    • Built on top of Matplotlib
  • Beautiful default styles
  • Designed to work well with Pandas DataFrame objects

Useful resources:

https://seaborn.pydata.org/examples/index.html

https://seaborn.pydata.org/api.html


S9V48 Distribution Plots

In [ ]:
import seaborn as sns
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

tips = sns.load_dataset('tips') # one of sns built-in datasets
tips.head()
Out[ ]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

Histogram

In [ ]:
sns.set_style("darkgrid")                                # nice, ggplot2-like :)

# from IPython.display import set_matplotlib_formats       # Change the default image format to a vector format
# set_matplotlib_formats('svg')                            # https://blakeaw.github.io/2020-05-25-improve-matplotlib-notebook-inline-res/

sns.set(rc={"figure.dpi":100, 'savefig.dpi':100})          # https://blakeaw.github.io/2020-05-25-improve-matplotlib-notebook-inline-res/
sns.set_context('notebook')

sns.distplot(tips['total_bill'], kde = False) # removing KDE - kernel density estimation
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6cc4c66c8>
In [ ]:
sns.distplot(tips['total_bill'], kde = False, bins = 30)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6cc55b948>
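
To see what `bins = 30` does to the data, here is a minimal sketch using NumPy's `np.histogram` rather than seaborn. The array `bills` is made-up data standing in for a numeric column, not the tips dataset. As far as I understand it, an integer `bins` splits the range from the minimum to the maximum into that many equal-width intervals and counts the observations in each one.

```python
import numpy as np

# Synthetic stand-in for a numeric column (hypothetical data, not `tips`)
rng = np.random.default_rng(0)
bills = rng.gamma(shape=4.0, scale=5.0, size=244)

# An integer `bins` splits [min, max] into that many equal-width intervals
counts, edges = np.histogram(bills, bins=30)

print(len(counts))     # number of bars
print(len(edges))      # one more edge than bars
print(counts.sum())    # every observation lands in exactly one bin
```

More bins give narrower bars and a noisier picture, fewer bins give a smoother but coarser one.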

Joint Plot

In [ ]:
sns.jointplot(x = 'total_bill', y = 'tip', data = tips) # `kind = 'scatter'`
Out[ ]:
<seaborn.axisgrid.JointGrid at 0x1f6cc606f88>
In [ ]:
sns.jointplot(x = 'total_bill', y = 'tip', data = tips, kind = 'hex')
Out[ ]:
<seaborn.axisgrid.JointGrid at 0x1f6cc606348>
In [ ]:
pl = sns.jointplot(x = 'total_bill', y = 'tip', data = tips, kind = 'reg',  color='royalblue')
pl.annotate(stats.pearsonr) # import scipy.stats as stats
C:\Users\PC\anaconda3\lib\site-packages\seaborn\axisgrid.py:1848: UserWarning: JointGrid annotation is deprecated and will be removed in a future release.
  warnings.warn(UserWarning(msg))
Out[ ]:
<seaborn.axisgrid.JointGrid at 0x1f6cccc5548>
In [ ]:
pl = sns.jointplot(x = 'total_bill', y = 'tip', data = tips, kind = 'kde')
pl.annotate(stats.pearsonr)
C:\Users\PC\anaconda3\lib\site-packages\seaborn\axisgrid.py:1848: UserWarning: JointGrid annotation is deprecated and will be removed in a future release.
  warnings.warn(UserWarning(msg))
Out[ ]:
<seaborn.axisgrid.JointGrid at 0x1f6cccdc848>
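
Since the warning above says that `JointGrid` annotation is deprecated, the correlation coefficient can also be computed directly. This is only a sketch on synthetic data: `x`, `y` and the helper `pearson_r` are names I made up for illustration, and `np.corrcoef` is used as a cross-check.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical data: y grows with x, plus some noise
rng = np.random.default_rng(1)
x = rng.uniform(5, 50, size=200)
y = 0.15 * x + rng.normal(0, 1, size=200)

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]   # same quantity, read off NumPy's correlation matrix
print(round(r_manual, 3), round(r_numpy, 3))
```

The resulting number could then be written onto the plot with matplotlib's own `annotate`, for example on the `pl.ax_joint` axes.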

Pairplot

In [ ]:
sns.pairplot(tips, hue = 'sex', palette = 'coolwarm')
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x1f6ce4c7d08>

Rugplot

In [ ]:
sns.rugplot(tips['total_bill'])
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6cec78048>
In [ ]:
sns.distplot(tips['total_bill'], kde = False)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6ced00b88>

Kernel Density Estimation (KDE) plots

A KDE plot replaces every observation with a Gaussian (normal) distribution centered on that value. The final curve is obtained by summing these distributions and rescaling the result so that the area under it equals 1.

  • One does not need to use Gaussian kernels for plotting KDE plots.
    • The Gaussian kernel is Seaborn's default
  • Triangular kernel...
  • Cosine kernel...

A great YouTube link for understanding how it works.

Kernel density estimation estimates the probability density function of the data.

The area under the curve is 1, and the probability of a value being between x1 and x2 is the area under the curve between those two points.
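
To convince myself of the two statements above, here is a small NumPy-only sketch. It builds a Gaussian KDE by hand on made-up data (the sample, the bandwidth `bw = 3.0` and the helper names are all my own assumptions, not seaborn's internals), then integrates the curve with the trapezoid rule.

```python
import numpy as np

def gaussian_kde_curve(data, grid, bw):
    """Average of Gaussian bumps (std = bw), one centered on each data point."""
    data = np.asarray(data, dtype=float)
    z = (grid[:, None] - data[None, :]) / bw
    bumps = np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

def area(y, x):
    """Trapezoid-rule integral of y over x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

rng = np.random.default_rng(42)
data = rng.normal(loc=20, scale=8, size=244)   # hypothetical sample
bw = 3.0                                        # arbitrary bandwidth for this sketch

grid = np.linspace(data.min() - 5 * bw, data.max() + 5 * bw, 2000)
density = gaussian_kde_curve(data, grid, bw)

total_area = area(density, grid)                # should be very close to 1

# P(x1 <= X <= x2) is the area under the curve between x1 and x2
x1, x2 = 10.0, 20.0
mask = (grid >= x1) & (grid <= x2)
prob = area(density[mask], grid[mask])

print(round(total_area, 4))
print(round(prob, 4))
```

The probability is only approximate here because the grid points do not fall exactly on `x1` and `x2`.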

In [ ]:
sns.kdeplot(tips['total_bill'], shade = True)
sns.rugplot(tips['total_bill'])
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6ce852248>
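  • Lower bw (bandwidth) -> narrower kernels, so a wigglier, higher-variance KDE
  • Higher bw -> wider kernels, so a smoother, lower-variance KDE (at the cost of blurring real structure)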
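
The two bullets above can be checked numerically. In this sketch (synthetic data, hand-written Gaussian KDE, and a made-up "wiggliness" score that just adds up how much the curve moves up and down) a smaller bandwidth should give a much rougher curve.

```python
import numpy as np

def gaussian_kde_curve(data, grid, bw):
    """Average of Gaussian bumps (std = bw), one centered on each data point."""
    z = (grid[:, None] - data[None, :]) / bw
    return (np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2 * np.pi))).mean(axis=1)

def total_variation(y):
    """Sum of absolute up-and-down movement along the curve (rough 'wiggliness' score)."""
    return float(np.abs(np.diff(y)).sum())

rng = np.random.default_rng(0)
data = rng.normal(size=25)                         # hypothetical sample
grid = np.linspace(data.min() - 4, data.max() + 4, 4000)

tv = {}
for bw in (0.02, 0.3, 2.0):
    curve = gaussian_kde_curve(data, grid, bw)
    tv[bw] = total_variation(curve)
    print(bw, round(tv[bw], 3), round(float(curve.max()), 3))
```

With a tiny bandwidth every data point becomes its own spike, and with a large one the whole sample blurs into a single low hump.
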
In [ ]:
sns.kdeplot(tips['total_bill'], bw = 20, label = 'bw = 20')
sns.kdeplot(tips['total_bill'], bw = 10, label = 'bw = 10')
sns.kdeplot(tips['total_bill'], bw = 5, label = 'bw = 5')
sns.kdeplot(tips['total_bill'], bw = 1, label = 'bw = 1')
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6ced5c388>
In [ ]:
# Understanding KDE plots step-by-step

dataset = np.random.randn(25) # Create dataset
dataset
Out[ ]:
array([ 0.12674025,  1.18975082,  0.13624467, -0.91246802, -1.46861517,
       -0.71418153, -0.83815175, -0.88399773,  0.03658769,  0.53362119,
       -1.72789312,  1.42415325,  0.60416016, -0.81851151,  1.37409697,
       -0.37803946,  0.55429909, -0.48998125,  0.61646879, -0.58932645,
        0.85023715,  0.44111352,  0.71163379, -0.83123673, -0.76151364])
In [ ]:
# Set up the x-axis for the plot
x_min = dataset.min() - 2
x_max = dataset.max() + 2

# 100 equally spaced points from x_min to x_max
x_axis = np.linspace(x_min,x_max,100)
In [ ]:
# Set up the bandwidth, using the Silverman method:

bandwidth = ((4*dataset.std()**5)/(3*len(dataset)))**.2

# Create an empty kernel list
kernel_list = []

# Plot each basis function
for data_point in dataset:
    
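    # Create a kernel for each point and append it to the list:
    # stats.norm(loc, scale) is a normal distribution centered on the data point
    # with standard deviation equal to the bandwidth; .pdf(x_axis) evaluates it on the grid
    kernel = stats.norm(data_point, bandwidth).pdf(x_axis) # probability density function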
    kernel_list.append(kernel)
    
    #Scale for plotting
    kernel = kernel / kernel.max()
    kernel = kernel * .4
    plt.plot(x_axis, kernel, color = 'grey', alpha = 0.5)

plt.ylim(0, 1)
Out[ ]:
(0, 1)
In [ ]:
# To get the kde plot we can sum these basis functions.

# Plot the sum of the basis function
sum_of_kde = np.sum(kernel_list, axis = 0)

# Plot figure
fig = plt.plot(x_axis, sum_of_kde, color='indianred')

# Add the initial rugplot
sns.rugplot(dataset, c = 'indianred')

# Get rid of y-tick marks
plt.yticks([])

# Set title
plt.suptitle("Sum of the Basis Functions")
Out[ ]:
Text(0.5, 0.98, 'Sum of the Basis Functions')
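
Two details of the step-by-step cells are worth checking with numbers. First, the bandwidth expression is algebraically the same as a constant of about 1.06 times `sigma * n**(-1/5)`. Second, the plain sum of the kernels has an area of about `n`, so it has to be divided by `n` to become a density with area 1. The sketch below uses NumPy only and a freshly generated dataset, so the values differ from the ones printed earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = rng.standard_normal(25)    # same size as the notebook's dataset, different values
n = len(dataset)
sigma = dataset.std()

# The bandwidth expression used in the cell above ...
bandwidth = ((4 * sigma ** 5) / (3 * n)) ** 0.2
# ... rewritten as (4/3)^(1/5) * sigma * n^(-1/5)
rule_of_thumb = (4 / 3) ** 0.2 * sigma * n ** (-0.2)

x_axis = np.linspace(dataset.min() - 2, dataset.max() + 2, 2000)

# One Gaussian density per data point, evaluated on the grid (rows = data points)
z = (x_axis[None, :] - dataset[:, None]) / bandwidth
kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))

sum_of_kde = kernels.sum(axis=0)     # the unnormalized sum, as plotted above
density = sum_of_kde / n             # dividing by n turns the sum into a density

def area(y, x):
    """Trapezoid-rule integral of y over x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

print(round((4 / 3) ** 0.2, 4))              # the constant in front of sigma * n^(-1/5)
print(round(area(sum_of_kde, x_axis), 3))    # close to n
print(round(area(density, x_axis), 3))       # close to 1
```

Because the plot above hides the y-ticks, the missing division by `n` does not change how the curve looks, only its scale.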

Bivariate KDE plots

This is based on the aforementioned YouTube video.

In [ ]:
cars = sns.load_dataset('mpg').dropna()
cars.info() # also useful: `cars.shape`
<class 'pandas.core.frame.DataFrame'>
Int64Index: 392 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           392 non-null    float64
 1   cylinders     392 non-null    int64  
 2   displacement  392 non-null    float64
 3   horsepower    392 non-null    float64
 4   weight        392 non-null    int64  
 5   acceleration  392 non-null    float64
 6   model_year    392 non-null    int64  
 7   origin        392 non-null    object 
 8   name          392 non-null    object 
dtypes: float64(4), int64(3), object(2)
memory usage: 30.6+ KB
In [ ]:
cars.head()
Out[ ]:
mpg cylinders displacement horsepower weight acceleration model_year origin name
0 18.0 8 307.0 130.0 3504 12.0 70 usa chevrolet chevelle malibu
1 15.0 8 350.0 165.0 3693 11.5 70 usa buick skylark 320
2 18.0 8 318.0 150.0 3436 11.0 70 usa plymouth satellite
3 16.0 8 304.0 150.0 3433 12.0 70 usa amc rebel sst
4 17.0 8 302.0 140.0 3449 10.5 70 usa ford torino
In [ ]:
sns.scatterplot(cars['horsepower'], cars['mpg'])
sns.kdeplot(cars['horsepower'], cars['mpg'], alpha = 0.5)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6cf235708>
In [ ]:
sns.kdeplot(cars['horsepower'], cars['mpg'], n_levels = 20)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6cf2c2b08>
In [ ]:
sns.set_style("whitegrid") 
sns.kdeplot(cars['horsepower'], cars['mpg'], 
    n_levels = 20, 
    cmap= 'Blues',
    shade = True)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6cf32b5c8>
In [ ]:
sns.set_style("whitegrid") 
sns.kdeplot(cars['horsepower'], cars['mpg'], 
    n_levels = 20, 
    cmap= 'Blues',
    shade = True, 
    shade_lowest = False)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6cf3b3e48>
In [ ]:
sns.kdeplot(cars['horsepower'], cars['mpg'], 
    n_levels = 20, 
    cmap = 'Blues',
    shade = True, 
    shade_lowest = False, 
    cbar = True)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6cf43f248>
In [ ]:
cyl_4 = cars[cars.cylinders == 4]
cyl_8 = cars[cars.cylinders == 8]

plt.figure(figsize = (8, 6))

sns.kdeplot(cyl_4.horsepower, cyl_4.mpg,
            cmap="Blues", shade = True, shade_lowest = False)
sns.kdeplot(cyl_8.horsepower, cyl_8.mpg,
            cmap="Reds", shade=True, shade_lowest=False)

plt.xlabel('Horsepower', fontsize = 14)
plt.ylabel('Miles per Gallon (MPG)', fontsize = 14)

# plt.annotate(): (s: str, xy: Tuple[float, float], *args: Any, **kwargs: Any)
# In order to understand this function better, I wrote some code in the next cell

plt.annotate("4 Cylinders", (105, 32), color = 'blue', fontsize = 16, fontweight = 'bold') 
plt.annotate("8 Cylinders", (190, 18), color = 'red', fontsize = 16, fontweight = 'bold');
In [ ]:
plt.figure(figsize = (8, 6))
plt.annotate('0, 0', (0, 0), color = 'green', fontsize = 10)
plt.annotate('0.2 , 0.2', (0.2 , 0.2), color = 'green', fontsize = 10) 
plt.annotate('0.5 , 0.5', (0.5 , 0.5), color = 'green', fontsize = 10) 
plt.annotate('0.2 , 0.5', (0.2 , 0.5), color = 'green', fontsize = 10) 
plt.annotate('0.5, 0.2', (0.5 , 0.2), color = 'green', fontsize = 10) 
plt.annotate('1, 1', (1, 1), color = 'green', fontsize = 10)

# plt.annotate places the text in data coordinates by default. An empty plot ranges from 0 to 1
# on both the X and the Y axis, so here the annotations are located by coordinates between 0 and 1
Out[ ]:
Text(1, 1, '1, 1')

S9V49 Categorical Plots

In [ ]:
tips.head()
Out[ ]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

Bar Plot

In [ ]:
sns.set_style('darkgrid')

sns.barplot(x = 'sex', y = 'total_bill', data = tips) # by default, we're looking at mean total_bill per gender
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6ceee1f88>
In [ ]:
sns.barplot(x = 'sex', y = 'total_bill', data = tips, estimator = np.std)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d057a5c8>
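
What the `estimator` argument changes is the function applied to each category before the bar is drawn. A minimal sketch of that idea in plain NumPy, on a tiny made-up dataset (the arrays and the `aggregate` helper are hypothetical, and this is not how seaborn implements it internally):

```python
import numpy as np

# Hypothetical mini-dataset: one category label and one numeric value per row
sex = np.array(['Female', 'Male', 'Male', 'Female', 'Male', 'Female'])
total_bill = np.array([10.0, 20.0, 30.0, 14.0, 40.0, 18.0])

def aggregate(labels, values, estimator):
    """Apply `estimator` to the values of each category: one number per bar."""
    return {str(label): float(estimator(values[labels == label]))
            for label in np.unique(labels)}

bar_heights_mean = aggregate(sex, total_bill, np.mean)   # mean per category: Female 14, Male 30
bar_heights_std = aggregate(sex, total_bill, np.std)     # spread per category instead of the mean

print(bar_heights_mean)
print(bar_heights_std)
```

Any function that reduces an array to a single number (for example `np.median`) could be passed in the same way.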

Count Plot

In [ ]:
sns.countplot(x = 'sex', data = tips)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d05cf148>

Boxplots

A boxplot is a standardized way of displaying the dataset based on the five-number summary: the minimum, the maximum, the sample median, and the first and third quartiles.

  • Minimum (Q0 or 0th percentile): the lowest data point in the data set excluding any outliers
  • Maximum (Q4 or 100th percentile): the highest data point in the data set excluding any outliers
  • Median (Q2 or 50th percentile): the middle value in the data set
  • First quartile (Q1 or 25th percentile): also known as the lower quartile q_n(0.25), it is the median of the lower half of the dataset.
  • Third quartile (Q3 or 75th percentile): also known as the upper quartile q_n(0.75), it is the median of the upper half of the dataset.
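
A quick sketch of the five-number summary on a small made-up array. One caveat I ran into: `np.percentile` interpolates linearly by default, so its quartiles do not always match the "median of the lower/upper half" definition from the list above. The example shows both, and it uses the raw minimum and maximum (no outliers are excluded here).

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)   # hypothetical data

# Quartiles by NumPy's default linear interpolation: Q1 = 3, median = 5, Q3 = 7
q0, q1, q2, q3, q4 = np.percentile(values, [0, 25, 50, 75, 100])
print(q0, q1, q2, q3, q4)

# Quartiles as medians of the halves below and above the median: 2.5 and 7.5
lower_half = values[values < q2]
upper_half = values[values > q2]
q1_halves = float(np.median(lower_half))
q3_halves = float(np.median(upper_half))
print(q1_halves, q3_halves)
```

The differences are small on real datasets, but they explain why two tools can report slightly different quartiles for the same data.
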
In [ ]:
sns.boxplot(x = 'day', y = 'total_bill', hue = 'smoker', data = tips) # `hue` is, of course, optional
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d062d548>
In [ ]:
sns.boxplot(x = 'day', y = 'total_bill', data = tips)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d067d248>
In [ ]:
inf = tips[tips['day'] == 'Thur']['total_bill'].describe()
inf
Out[ ]:
count    62.000000
mean     17.682742
std       7.886170
min       7.510000
25%      12.442500
50%      16.200000
75%      20.155000
max      43.110000
Name: total_bill, dtype: float64
In [ ]:
IQR = inf['75%'] - inf['25%']

# useful guide for printing: https://www.delftstack.com/howto/python/python-print-string-and-variable/

print(f"IQR is: {IQR}\n1.5 * IQR = {1.5 * IQR}\n3Q + (1.5 * IQR) = {inf['75%'] + (1.5 * IQR)}")
IQR is: 7.712500000000002
1.5 * IQR = 11.568750000000003
3Q + (1.5 * IQR) = 31.723750000000003
  • In this case, the maximum value in this data set is $43.11.

  • 1.5 IQR above the third quartile is $31.72.

The maximum is greater than the third quartile plus 1.5 IQR, so the maximum is an outlier.

Therefore, the upper whisker is drawn at the greatest observed value that does not exceed the third quartile plus 1.5 IQR.
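
The whisker rule can be written as a few lines of NumPy. This is a sketch of the usual 1.5 IQR convention on made-up numbers. The function name and the data are hypothetical, and the quartiles come from `np.percentile`, so they may differ slightly from other tools.

```python
import numpy as np

def whiskers_and_outliers(values, k=1.5):
    """Whiskers end at the most extreme points inside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.sort(np.asarray(values, dtype=float))
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return float(inside.min()), float(inside.max()), outliers

bills = [7.5, 10, 12, 13, 16, 17, 20, 21, 25, 43]   # hypothetical bills
low, high, outliers = whiskers_and_outliers(bills)

# Here Q1 = 12.25 and Q3 = 20.75, so the fences are -0.5 and 33.5:
# the whiskers end at 7.5 and 25, and 43 is drawn as an outlier point
print(low, high, outliers)
```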

In [ ]:
sns.boxplot(x = 'day', y = 'total_bill', hue = 'smoker', data = tips)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d0ac0348>

Violin-plots

In [ ]:
sns.violinplot(x = 'day', y = 'total_bill', hue = 'smoker', data = tips)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d0bd7d08>
In [ ]:
sns.violinplot(x = 'day', y = 'total_bill', data = tips, hue = 'smoker', split = True)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d0c8aa88>
In [ ]:
sns.boxplot(x = 'day', y = 'total_bill', hue = 'smoker', data = tips)
sns.violinplot(x = 'day', y = 'total_bill', hue = 'smoker', data = tips)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d0d4d488>

Strip-plot

In [ ]:
sns.stripplot(x = 'day', y = 'total_bill', data = tips) # default: `jitter = True`
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d0ecb0c8>
In [ ]:
sns.stripplot(x = 'day', y = 'total_bill', data = tips, jitter = False)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d0f17708>
In [ ]:
sns.stripplot(x = 'day', y = 'total_bill', data = tips, hue = 'sex', dodge = True)
# The `split` parameter has been renamed to `dodge`.
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d0f7a388>

Swarm-plot

In [ ]:
sns.swarmplot(x = 'day', y = 'total_bill', data = tips)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d0fed188>
In [ ]:
sns.violinplot(x = 'day', y = 'total_bill', data = tips)
sns.swarmplot(x = 'day', y = 'total_bill', data = tips, color = 'black')
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d104d188>

catplot - the most general method for plotting categorical data

In [ ]:
sns.catplot(x = 'day', y = 'total_bill', data = tips, kind = 'bar')
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6d209c708>

S9V50 Matrix Plots

Heatmap

In [ ]:
# tips = sns.load_dataset('tips')
flights = sns.load_dataset('flights')

tips.head()
Out[ ]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
In [ ]:
flights.head()
Out[ ]:
year month passengers
0 1949 January 112
1 1949 February 118
2 1949 March 132
3 1949 April 129
4 1949 May 121
In [ ]:
tc = tips.corr()
tc
Out[ ]:
total_bill tip size
total_bill 1.000000 0.675734 0.598315
tip 0.675734 1.000000 0.489299
size 0.598315 0.489299 1.000000
In [ ]:
sns.heatmap(tc, cmap = 'BuPu', annot = True) # https://python-graph-gallery.com/92-control-color-in-seaborn-heatmaps
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d211fe88>
In [ ]:
flights
Out[ ]:
year month passengers
0 1949 January 112
1 1949 February 118
2 1949 March 132
3 1949 April 129
4 1949 May 121
... ... ... ...
139 1960 August 606
140 1960 September 508
141 1960 October 461
142 1960 November 390
143 1960 December 432

144 rows × 3 columns

In [ ]:
fp = flights.pivot(index = 'month', columns = 'year', values = 'passengers')
fp
Out[ ]:
year 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
month
January 112 115 145 171 196 204 242 284 315 340 360 417
February 118 126 150 180 196 188 233 277 301 318 342 391
March 132 141 178 193 236 235 267 317 356 362 406 419
April 129 135 163 181 235 227 269 313 348 348 396 461
May 121 125 172 183 229 234 270 318 355 363 420 472
June 135 149 178 218 243 264 315 374 422 435 472 535
July 148 170 199 230 264 302 364 413 465 491 548 622
August 148 170 199 242 272 293 347 405 467 505 559 606
September 136 158 184 209 237 259 312 355 404 404 463 508
October 119 133 162 191 211 229 274 306 347 359 407 461
November 104 114 146 172 180 203 237 271 305 310 362 390
December 118 140 166 194 201 229 278 306 336 337 405 432
In [ ]:
cmap = sns.cm.rocket_r          # reversing the color scheme
sns.heatmap(fp, cmap = cmap)    # https://stackoverflow.com/questions/47461506/how-to-invert-color-of-seaborn-heatmap-colorbar
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d21d2fc8>
In [ ]:
sns.heatmap(fp, cmap = 'magma', linecolor = 'white', linewidths = 0.3)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6d22860c8>

Cluster-map

In [ ]:
sns.clustermap(fp, cmap = 'coolwarm')
Out[ ]:
<seaborn.matrix.ClusterGrid at 0x1f6d233aec8>
In [ ]:
sns.clustermap(fp, cmap = 'coolwarm', standard_scale = 1)
Out[ ]:
<seaborn.matrix.ClusterGrid at 0x1f6d228b1c8>
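
As far as I understand the seaborn documentation, `standard_scale = 1` rescales every column (here: every year) by subtracting its minimum and dividing by its maximum, so each year runs from 0 to 1 before clustering. The sketch below reproduces that idea with NumPy on a few values taken from the pivot table above (January, July and November of 1949, 1955 and 1960). The helper `min_max_scale` is my own, not a seaborn function.

```python
import numpy as np

# Rows: January, July, November; columns: 1949, 1955, 1960
table = np.array([[112., 242., 417.],
                  [148., 364., 622.],
                  [104., 237., 390.]])

def min_max_scale(a, axis):
    """Subtract the minimum along `axis`, then divide by the resulting maximum."""
    shifted = a - a.min(axis=axis, keepdims=True)
    return shifted / shifted.max(axis=axis, keepdims=True)

# axis=0 walks down the rows, so each column (year) is scaled on its own
scaled_columns = min_max_scale(table, axis=0)
print(scaled_columns.round(3))
```

After scaling, July is 1 and November is 0 in every year, so the clustering compares the seasonal pattern instead of the overall growth in passengers.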

S9V51 Grids

sns.PairGrid

In [ ]:
iris = sns.load_dataset('iris')
iris.head()
Out[ ]:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
In [ ]:
iris['species'].unique()
Out[ ]:
array(['setosa', 'versicolor', 'virginica'], dtype=object)
In [ ]:
sns.pairplot(iris)
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x1f6d3156c88>
In [ ]:
sns.pairplot(iris, hue = 'species', palette = 'hls') # https://seaborn.pydata.org/tutorial/color_palettes.html
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x1f6d3fa4e88>
In [ ]:
g = sns.PairGrid(iris)
g.map(sns.scatterplot)
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x1f6d4d16d08>
In [ ]:
g = sns.PairGrid(iris)
g.map_diag(sns.distplot)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x1f6d4d6ee48>

sns.FacetGrid

In [ ]:
tips.head()
Out[ ]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
In [ ]:
g = sns.FacetGrid(data = tips, col = 'time', row = 'smoker')
g.map(sns.distplot, 'total_bill')
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6d76b8b48>
In [ ]:
g = sns.FacetGrid(data = tips, col = 'time', row = 'smoker')
g.map(plt.scatter, 'total_bill', 'tip').add_legend()
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6d7f6ec08>

sns.JointGrid

In [ ]:
g = sns.JointGrid(x= 'total_bill', y = 'tip', data = tips)
In [ ]:
g = sns.JointGrid(x = 'total_bill', y = 'tip', data = tips)
g = g.plot(sns.regplot, sns.distplot)

S9V52 Regression Plots

In [ ]:
tips.head()
Out[ ]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
In [ ]:
sns.lmplot(x = 'total_bill', y = 'tip', data = tips)
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6d8751808>
In [ ]:
sns.lmplot(x = 'total_bill', y = 'tip', data = tips, hue = 'sex')
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6d87cc3c8>
In [ ]:
sns.lmplot(x = 'total_bill', y = 'tip', data = tips, hue = 'sex', markers = ['o', 'v']) # matplotlib
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6d8ca92c8>
In [ ]:
sns.lmplot(x = 'total_bill', y = 'tip', data = tips, hue = 'sex', markers = ['o', 'v'],
           scatter_kws = {'s' : 100} ) # direct call to matplotlib
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6d8cdfa88>
In [ ]:
sns.set_context('paper', font_scale = 2)
sns.lmplot(x = 'total_bill', y = 'tip', data = tips, col = 'day', row = 'smoker', 
           hue='sex', palette = 'coolwarm')
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6d8d5d488>
In [ ]:
sns.set_context('paper', font_scale = 2)
sns.lmplot(x = 'total_bill', y = 'tip', data = tips, col = 'sex', row = 'time', aspect = 3, height = 3) # ratio between w & h
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6dac031c8>

S9V53 Style and Color

In [ ]:
sns.set_style('ticks')        # None, or one of {darkgrid, whitegrid, dark, white, ticks}
sns.countplot(x = 'sex', data = tips)
sns.despine(left = True, bottom = True)  # top, right, left, bottom : boolean, optional
In [ ]:
plt.figure(figsize = (12, 3))          # Matplotlib works in combination with Seaborn
sns.countplot(x = 'sex', data = tips)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6db164448>
In [ ]:
# help(sns.set_context)
 
sns.set_context('poster', font_scale = 3) #  None, or one of {paper, notebook, talk, poster} # `font_scale = 3` (3 times the default)
sns.countplot(x = 'sex', data = tips)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6db389208>
In [ ]:
sns.set_context('notebook', font_scale = 1) #  None, or one of {paper, notebook, talk, poster} # `font_scale = 1` restores the default font size
sns.countplot(x = 'sex', data = tips)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6db21f1c8>
In [ ]:
sns.lmplot(x = 'total_bill', y = 'tip', data = tips, hue = 'sex', palette = 'seismic')
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6db283a88>

Exercises

In [ ]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
sns.set_context('notebook', font_scale = 1)
sns.set(rc={"figure.dpi":100, 'savefig.dpi':100}) 

titanic = sns.load_dataset('titanic')
titanic.head()
Out[ ]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
In [ ]:
pl = sns.jointplot(x = 'fare', y = 'age', data = titanic)

import scipy.stats as stats
pl.annotate(stats.pearsonr)
C:\Users\PC\anaconda3\lib\site-packages\seaborn\axisgrid.py:1848: UserWarning: JointGrid annotation is deprecated and will be removed in a future release.
  warnings.warn(UserWarning(msg))
Out[ ]:
<seaborn.axisgrid.JointGrid at 0x1f6db308348>
In [ ]:
sns.distplot(titanic['fare'], kde = False)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6db2fb088>
In [ ]:
sns.boxplot(x = 'class', y = 'age', data = titanic)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6dc8cd608>
In [ ]:
sns.swarmplot(x = 'class', y = 'age', data = titanic)
C:\Users\PC\anaconda3\lib\site-packages\seaborn\categorical.py:1326: RuntimeWarning: invalid value encountered in less
  off_low = points < low_gutter
C:\Users\PC\anaconda3\lib\site-packages\seaborn\categorical.py:1330: RuntimeWarning: invalid value encountered in greater
  off_high = points > high_gutter
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6dc915d88>
In [ ]:
titanic.columns

sns.countplot(x = 'sex', data = titanic)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6dc9c2888>
In [ ]:
tc = titanic.corr()

sns.heatmap(tc, cmap = 'coolwarm')
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f6dca0d908>
In [ ]:
fg = sns.FacetGrid(data = titanic, col = 'sex', hue = 'sex')
fg.map(sns.distplot, 'age', kde = False, bins = 10)
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6dcac2d48>
In [ ]:
fg = sns.FacetGrid(data = titanic, col = 'sex') # This is Jose's solution
fg.map(plt.hist, 'age')
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x1f6dcaaa648>
In [ ]:
!jupyter nbconvert --to html_toc file.ipynb   # shell command, so it needs a leading `!` in a notebook cell; the flag is `--to`